Abstract

We formulate quantum rate-distortion theory in the most general setting where classical
side information is included in the tradeoff. Using a natural distortion measure based on
entanglement fidelity and specializing to the case of an unrestricted classical side channel,
we find the exact quantum rate-distortion function for a source of isotropic qubits. An upper
bound we believe to be exact is found in the case of biased sources. We establish that in this
scenario optimal rate-distortion codes produce no entropy exchange with the environment of
any individual qubit.

Key words: Entanglement, entanglement fidelity, quantum information theory, quantum
rate-distortion theory, qubit, rate-distortion theory, source coding.

1 Introduction 



The quantum lossless source coding theorem specifies the minimum rate, called the entropy and 
measured in code qubits per source qubit, to which a quantum source can be compressed subject 
to the requirement that the source qubits can be recovered perfectly from the code qubits. In 
realistic applications we may be able to tolerate imperfect recovery of the source qubits at the 
receiver, in which case we would seek to minimize the rate required to achieve a specified level 
of distortion. Equivalently, we may be required to use a rate R less than the entropy of the 
source, in which case we would seek to minimize the distortion subject to this rate constraint. 
Here, the distortion measure is a user-defined function of the input and the reconstruction,
the precise form of which depends on the nature of the application.

Analysis of the tradeoff between rate and distortion is the subject matter of rate-distortion
theory. Classical rate-distortion theory [1] is an important and fertile area in information theory.
Considering that coding theorems for both noiseless [2] and noisy [3] quantum channels were
established some years ago, it is surprising that little effort has been put into developing
quantum rate-distortion theory. The purpose of this paper is to fill that gap.

To be completely general one must allow for a classical side channel containing information
gathered while manipulating the source qubits, and include the corresponding classical rate $r$,
measured in bits per source qubit, in the tradeoff. It has been shown in [4] that at zero distortion
no classical side information can help reduce the quantum rate below the von Neumann entropy
of the source. This turns out not to be the case for positive distortion $d$. Therefore one must
speak of a two-dimensional tradeoff manifold $R(d, r)$. Here we introduce this general formulation for
the first time. However, we focus mainly on the scenario of unrestricted classical side information,
i.e. $r = \infty$, and refer to $R(d) = R(d, \infty)$ as the rate-distortion function. This clearly provides a
lower bound on the achievable $R$ for the same distortion $d$ but restricted classical rate $r$.

In classical information theory the rate-distortion function has the simple form

$$R(D) = \min_{Y:\; E_{X,Y}\, d(X,Y) \le D} I(X;Y), \tag{1}$$

where $X$ is a random variable distributed like a typical source letter, $Y$ is a random variable
jointly distributed with $X$ that is used to construct approximations to the source output and
ranges over an alphabet possibly different from the source alphabet, $E_{X,Y}$ denotes expectation
with respect to the joint distribution of $X$ and $Y$, $d(\cdot,\cdot)$ is a suitably defined distortion function,
and $I(X;Y)$ is the average mutual information between $X$ and $Y$. The information-like
quantity playing the analogous role in the quantum channel capacity formula is the coherent information
$I_c(\rho, \mathcal{E})$ [5], to be defined in the next section. The natural first impulse is to assume that the
same quantity should appear in quantum rate-distortion theory. Indeed, Barnum [6] has derived
a lower bound on $R(d, 0)$ based on coherent information. This bound is far from tight, however,
since the coherent information is often negative for distortions considerably smaller than that
which can be achieved when the receiver is sent no qubits at all. (A comparable problem does
not occur in channel capacity calculations because the maximization procedure invoked there
ensures positivity.) In view of this we pursue quantum rate-distortion theory from first principles using
a natural distortion measure based on entanglement fidelity that was introduced in [6].

We define the problem in Section 2, wherein we also provide relevant background on quantum 
operations, entropies and fidelity measures. In Section 3 we find the rate-distortion function for 
a restricted class of coding procedures; in Section 4 we argue that the optimum coding scheme 
belongs to this class. Section 5 describes a simple physical realization of the optimal coding 
procedure. Speculations are left for the final section. 



2 Definitions 

Let us recall some basic definitions of quantum information theory [7], [8]. A general quantum
information source is described by a density matrix $\rho^Q$ of a quantum system $Q$. This density
matrix may result from the system being prepared in certain pure states with respective
probabilities. Alternatively, we may view our quantum system $Q$ as part of a larger system $RQ$
that includes a reference system $R$, which can always be constructed such that the overall
state $|\Psi^{RQ}\rangle$ is pure and $\rho^Q$ results from restricting it to $Q$, i.e.,

$$\rho^Q = \mathrm{tr}_R\left(|\Psi^{RQ}\rangle\langle\Psi^{RQ}|\right). \tag{2}$$

Next consider a quantum process acting on the source $\rho^Q$,

$$\rho^Q \to \rho^{Q'} = \frac{\mathcal{E}(\rho^Q)}{\mathrm{tr}(\mathcal{E}(\rho^Q))}, \tag{3}$$

with a general quantum operation $\mathcal{E}$ of the form

$$\mathcal{E}(\rho^Q) = \sum_{i=1}^{k} A_i \rho^Q A_i^\dagger. \tag{4}$$

Note that the action of $\mathcal{E}$ is completely determined by the set of operation elements $\{A_i\}$. A
useful way to think about the quantum process is to embed $RQ$ into an even larger space $RQE$
by adding an environment $E$, initially in a pure state $|s\rangle$ and hence decoupled from $RQ$. Then
a well-known representation theorem [7], [8] states that a general quantum process $\mathcal{E}$ may be
realized by performing a unitary transformation $U^{QE}$ entangling $Q$ and $E$, followed by a projection
$P^E$ acting on the environment alone, and then tracing out $R$ and $E$; i.e.,

$$\mathcal{E}(\rho^Q) = c\, \mathrm{tr}_{RE}\left(P^E U^{QE}\left(|\Psi^{RQ}\rangle\langle\Psi^{RQ}| \otimes |s\rangle\langle s|\right) U^{QE\dagger} P^E\right), \tag{5}$$

where $c$ is a positive constant. Although the theorem refers to a mathematical construction, it
provides physical insight. For instance, it enables one to define the entropy exchange [7], [8]

$$S_e(\rho^Q, \mathcal{E}) = S(\rho^{E'}) = S(\rho^{RQ'}). \tag{6}$$

Here $S(\sigma) = -\mathrm{tr}(\sigma \log_2 \sigma)$ is the von Neumann entropy and $\rho^{E'}$ and $\rho^{RQ'}$ denote the states of $E$
and $RQ$, respectively, after the operation. The equality in (6) holds because the whole system
$RQE$ remains in a pure state after the process, as a consequence of which $S_e(\rho^Q, \mathcal{E})$ measures
the amount of "disorder", or "noise", introduced into the system $RQ$ by virtue of its having
become entangled with $E$, and vice versa.

A convenient expression in terms of the original operation elements $\{A_i\}$ is

$$S_e(\rho^Q, \mathcal{E}) = S(W) = -\mathrm{tr}(W \log_2 W) \tag{7}$$

with

$$W_{ij} = \frac{\mathrm{tr}(A_i \rho^Q A_j^\dagger)}{\mathrm{tr}(\mathcal{E}(\rho^Q))}. \tag{8}$$

Observe that if there is only one operation element (or, equivalently, if they are all the same),
then the entropy exchange is zero. The noise interpretation of $S_e$ is also evident from the formula
for coherent information,

$$I_c(\rho^Q, \mathcal{E}) = S(\mathcal{E}(\rho^Q)) - S_e(\rho^Q, \mathcal{E}), \tag{9}$$

that appears in the channel capacity formula. Comparing $I_c(\rho^Q, \mathcal{E})$ to its classical counterpart
$I(X;Y) = H(Y) - H(Y|X)$, we see that $S_e(\rho^Q, \mathcal{E})$ plays a role analogous to the noise term,
$H(Y|X)$.

We end this brief review with the definition of entanglement fidelity, denoted by $F_e(\rho^Q, \mathcal{E})$ and
defined by

$$F_e(\rho^Q, \mathcal{E}) = \langle\Psi^{RQ}|\,(I^R \otimes \mathcal{E})\left(|\Psi^{RQ}\rangle\langle\Psi^{RQ}|\right)|\Psi^{RQ}\rangle. \tag{10}$$

The entanglement fidelity tells us how well the system's state and the system's entanglement
with its surroundings $R$, which do not participate directly in the quantum process, are preserved
under the operation in question. Like any meaningful quantity it has an expression which is
manifestly independent of which purification $R$ is employed, namely

$$F_e(\rho^Q, \mathcal{E}) = \frac{\sum_i |\mathrm{tr}(A_i \rho^Q)|^2}{\mathrm{tr}(\mathcal{E}(\rho^Q))}. \tag{11}$$
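
As a small numerical illustration of (8) and (11), the sketch below evaluates the entropy exchange and the entanglement fidelity for a qubit operation with real diagonal operation elements acting on a diagonal source. The function names and the restriction to diagonal elements (and, for the entropy exchange, to exactly two elements) are assumptions made only to keep the example short; they are not part of the formalism above.

```python
import math

def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def w_matrix(elements, p):
    """W_ij = tr(A_i rho A_j^dagger) / tr(E(rho)) for real diagonal A_i, diagonal rho.

    elements: list of (a0, a1) pairs, the diagonals of the A_i (illustrative assumption).
    p: (p0, p1), the eigenvalues of rho.
    """
    raw = [[sum(ai[k] * p[k] * aj[k] for k in range(2)) for aj in elements]
           for ai in elements]
    norm = sum(raw[i][i] for i in range(len(elements)))
    return [[x / norm for x in row] for row in raw]

def entropy_exchange_two_elements(elements, p):
    """S_e = S(W) when there are exactly two operation elements."""
    w = w_matrix(elements, p)
    # tr W = 1, so the eigenvalues of the symmetric 2x2 matrix W are
    # (1 +/- sqrt(1 - 4 det W)) / 2.
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    return h2((1.0 + disc) / 2.0)

def entanglement_fidelity(elements, p):
    """F_e = sum_i |tr(A_i rho)|^2 / tr(E(rho)) for real diagonal A_i."""
    num = sum(sum(a[k] * p[k] for k in range(2)) ** 2 for a in elements)
    den = sum(sum(a[k] ** 2 * p[k] for k in range(2)) for a in elements)
    return num / den
```

For example, a phase-flip operation with elements proportional to the identity and to diag(1, -1), applied to an isotropic qubit, gives a fidelity equal to the weight of the identity element and an entropy exchange equal to the binary entropy of that weight.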



We now augment Barnum's formulation of the $r = 0$ case [6] to allow for classical side
information. First we restrict attention to i.i.d. sources with density matrix $\rho$, so that $\rho^{(n)} = \rho^{\otimes n}$. An
$(n, R, r)$ rate-distortion code consists of an encoding operation $\mathcal{C}^{(n)}$ from the source space
to a block of $nR$ qubits and $nr$ bits (each rounded to an integer, which we suppress in the notation),
and a decoding operation $\mathcal{D}^{(n)}$ acting in the reverse direction. Here $R \le 1$, so in effect we are
compressing the $n$ source qubits to $nR$ qubits and then decompressing them back to $n$ qubits
with the help of $nr$ bits of information gathered during the compression phase, in an attempt to
recover the original with the maximum possible fidelity consistent with the values of $R$ and $r$.

For the rate-distortion code $(\mathcal{C}^{(n)}, \mathcal{D}^{(n)})$ Barnum defines a natural distortion based on
entanglement fidelity, namely

$$d_e(\rho^{(n)}, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}) = \frac{1}{n}\sum_{\alpha=1}^{n}\left(1 - F_e(\rho, T_\alpha)\right), \tag{12}$$

with $T_\alpha$ being the marginal operation on the $\alpha$-th copy of $\rho$ induced by the encoding-decoding
operation,

$$T_\alpha(\sigma) = \mathrm{tr}_{1,\dots,\alpha-1,\alpha+1,\dots,n}\; \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}\left(\rho \otimes \cdots \otimes \rho \otimes \sigma \otimes \rho \otimes \cdots \otimes \rho\right). \tag{13}$$

We say that a rate-distortion triplet $(R, r, d)$ is achievable for a given $\rho$ iff there exists a sequence
of $(n, R, r)$ rate-distortion codes $(\mathcal{C}^{(n)}, \mathcal{D}^{(n)})$ such that

$$\lim_{n\to\infty} d_e(\rho^{(n)}, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)}) \le d. \tag{14}$$

Then the rate-distortion manifold $R(d, r)$ is defined as the infimum of all $R$ for which $(R, r, d)$
is achievable.

In the following we approach the problem of finding $R(d, r)$ from first principles. With no loss
of generality the encoding procedure may be divided into two steps. In the first step the encoder
manipulates blocks of qubits of size $n$ via some quantum operation $\mathcal{E}(\rho^{(n)}) = \sum_{i=1}^{k} A_i \rho^{(n)} A_i^\dagger$.
For $\mathcal{E}$ to be physical its operation elements $\{A_i\}$ must satisfy the trace-preserving condition
$\sum_{i=1}^{k} A_i^\dagger A_i = I$. Define quantum operations $\mathcal{E}_{A_i}(\rho^{(n)}) = A_i \rho^{(n)} A_i^\dagger$. A given decomposition $\{A_i\}$
of unity implies that the probability that the non-trace-preserving operation $\mathcal{E}_{A_i}$ is the one
that will be performed is $\lambda_i = \mathrm{tr}(\mathcal{E}_{A_i}(\rho^{(n)}))$. Quantum mechanics forbids the encoder to have
control over which of the $k$ operations will get performed, but afterwards the encoder can obtain
information about which one actually took place. This information is embodied in the index
random variable $I$ taking integer values $i$, $1 \le i \le k$, with respective probabilities $\lambda_i$. In general
some or all of this information may be available to the decoder, embodied in the random variable
$J = f(I)$, a deterministic function of $I$. Further define

$$S = E_J\, S\!\left(E_{I|J}\, \mathcal{E}_{A_I}(\rho^{(n)})\right), \tag{15}$$

the average output von Neumann entropy conditional on the knowledge of $J$ (i.e. from the
point of view of somebody who knows the value of $J$ but not the value of $I$). Given $R$ and $r$,
the goal is to choose $\mathcal{E}$ and $f$ so that the distortion is minimized while keeping $S \le nR$ and
$H(J) \le nr$.

In the second step we take a large number $N$ of such blocks, group them according to the
value of $J$, and process each group in the standard lossless coding way [2], [9] in order to get a
string of at most $NnR$ qubits in the limit of large $N$. The decoding procedure simply reverses
the second step, which the lossless coding theorem assures us can be done with effectively perfect
fidelity in the limit as $N \to \infty$ (for fixed $n$), using the $Nnr$ bits of classical information
about the values of $J$ for each block so that the decoder may unscramble them properly. Finally, the
rate-distortion function will be achieved in the limit of large $n$, as well as large $N$.

Since the distortion depends only on the operation elements $A_i$, the choice of $f$ only affects
the tradeoff between $R$ and $r$. Using the concavity of von Neumann entropy [10] and the fact
that $E_J E_{I|J} = E_I$, we have the following inequalities:

$$E_I\, S(\mathcal{E}_{A_I}(\rho^{(n)})) \le S \le S(E_I\, \mathcal{E}_{A_I}(\rho^{(n)})) = S(\mathcal{E}(\rho^{(n)})). \tag{16}$$

The upper bound is attained when $f = \mathrm{const}$, i.e. when no classical side information is
allowed. The lower bound on $S$ is attained when $f$ is the identity map, in which case $H(J) =
H(I) = -\sum_{i=1}^{k} \lambda_i \log_2 \lambda_i$ is maximum. An intuitive argument for the latter is that, from the
point of view of the decoder, only single-element operations $\mathcal{E}_{A_i}$ have been performed; these in
turn have zero entropy exchange, which we interpreted as noise. Whenever the decoder lacks
information about the value of $I$, the entropy exchange of the block is strictly positive.

We shall henceforth concentrate on the case of maximal classical rate $r$, thus reducing the
problem to finding the tradeoff function $R_n(d)$ between $S = \sum_{i=1}^{k} \lambda_i S(\mathcal{E}_{A_i}(\rho^{(n)}))$ and the
distortion $d_e(\rho^{(n)}, \mathcal{E})$. The rate-distortion function is given by the limit $R(d) = \lim_{n\to\infty} R_n(d)$. In
the next section we analyze the $n = 1$ case. Subsequently we demonstrate the perhaps surprising
fact that $n = 1$ already attains the $R(d)$ curve.
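
The two ends of (16) can be checked numerically on a toy case. The sketch below assumes a single qubit ($n = 1$) with source diag(p0, p1) and two real diagonal operation elements, diag(cos a, cos(a + D)) and diag(sin a, sin(a + D)), chosen here purely for illustration; the outputs are then diagonal, so von Neumann entropies reduce to Shannon entropies.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)

def entropy_bounds(alpha, delta, p0):
    """Return (lower, upper) = (E_I S(E_{A_I}(rho)), S(E(rho))) from (16).

    Assumes two diagonal elements that satisfy the trace-preserving condition;
    alpha and delta are illustrative parameters.
    """
    p1 = 1.0 - p0
    a1 = (math.cos(alpha), math.cos(alpha + delta))
    a2 = (math.sin(alpha), math.sin(alpha + delta))
    lower = 0.0
    mixture = [0.0, 0.0]
    for a in (a1, a2):
        out = (p0 * a[0] ** 2, p1 * a[1] ** 2)  # diagonal of A rho A^dagger
        lam = out[0] + out[1]                   # probability of this outcome
        if lam > 0.0:
            lower += lam * shannon([x / lam for x in out])
        mixture[0] += out[0]
        mixture[1] += out[1]
    upper = shannon(mixture)  # decoder ignorant of the outcome (f = const)
    return lower, upper
```

With full side information the decoder sees the lower value; with none it sees the upper one, which for these elements is just the source entropy.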



3 The rate-distortion function for n = 1 

Let us temporarily restrict attention to $k = 1$, so that (4) becomes $\mathcal{E}(\sigma) = A\sigma A^\dagger$, and also
temporarily ignore the trace-preserving constraint. First a technical lemma:

Lemma 1 Let $\Delta$ and $\Lambda$ be positive diagonal matrices whose diagonal elements are given in
non-ascending order. Then for any unitary $U$ and $V$ the inequality $|\mathrm{tr}(U\Delta V\Lambda)| \le \mathrm{tr}(\Delta\Lambda)$ holds.

Proof Consider the Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product
$(A, B) = \mathrm{tr}(AB^\dagger)$, namely

$$|\mathrm{tr}(AB^\dagger)|^2 \le \mathrm{tr}(AA^\dagger)\,\mathrm{tr}(BB^\dagger). \tag{17}$$

Since $\Delta$ and $\Lambda$ are positive we have $\Delta = \sqrt{\Delta}\sqrt{\Delta}^\dagger$ and $\Lambda = \sqrt{\Lambda}\sqrt{\Lambda}^\dagger$. Setting $A = \sqrt{\Delta}\,V\sqrt{\Lambda}$ and
$B = \sqrt{\Delta}\,U^\dagger\sqrt{\Lambda}$, we find that

$$|\mathrm{tr}(U\Delta V\Lambda)|^2 \le \mathrm{tr}(U\Delta U^\dagger\Lambda)\,\mathrm{tr}(V\Lambda V^\dagger\Delta), \tag{18}$$

so the lemma will hold for general unitary $U$ and $V$ provided it holds when $V = U^\dagger$. Next,
denote the elements of $U$ and the diagonal elements of $\Delta$ and $\Lambda$ by $\{u_{ij}\}$, $\{\delta_i\}$ and $\{\lambda_i\}$, respectively.
Defining the matrix $P$ with elements $p_{ij} = |u_{ij}|^2$, we have

$$\mathrm{tr}(U\Delta U^\dagger\Lambda) = \sum_{i,j} u_{ij}\,\delta_j\,u_{ij}^*\,\lambda_i = \sum_{i,j} p_{ij}\,\delta_j\,\lambda_i. \tag{19}$$

Since the elements of each row and column of $P$ add up to 1, $P$ is a doubly stochastic matrix, and hence a
convex combination of permutation matrices. So the maximum value of $\mathrm{tr}(U\Delta U^\dagger\Lambda)$ is equal
to $\sum_i \delta'_i\lambda_i$ with $\delta'_i$ a permutation of the $\delta_i$. By Chebyshev's inequality $P = I$ corresponds to the
optimum permutation; this is especially easy to see for $2 \times 2$ matrices, for which the ordering
condition implies $(\lambda_1 - \lambda_2)(\delta_1 - \delta_2) \ge 0$, or $\lambda_1\delta_1 + \lambda_2\delta_2 \ge \lambda_1\delta_2 + \lambda_2\delta_1$. Therefore $U = V = I$
maximizes $|\mathrm{tr}(U\Delta V\Lambda)|$; the Lemma is proved.
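
The last step of the proof can be illustrated numerically. The sketch below is not part of the argument; it assumes small hand-picked spectra, searches all pairings by brute force, and evaluates the right-hand side of (19) for a real $2 \times 2$ rotation. The helper names are hypothetical.

```python
import math
from itertools import permutations

def best_pairing(delta, lam):
    """Permutation of delta that maximizes sum_i lam[i] * delta'[i] (brute force)."""
    return max(permutations(delta),
               key=lambda perm: sum(l * d for l, d in zip(lam, perm)))

def rotated_trace(phi, delta, lam):
    """tr(U Delta U^dagger Lambda) via (19) for a real 2x2 rotation U by angle phi."""
    c2, s2 = math.cos(phi) ** 2, math.sin(phi) ** 2
    p = [[c2, s2], [s2, c2]]  # p_ij = |u_ij|^2, a doubly stochastic matrix
    return sum(p[i][j] * delta[j] * lam[i] for i in range(2) for j in range(2))
```

When both spectra are sorted in non-ascending order, `best_pairing` returns the spectrum unchanged, and `rotated_trace` never exceeds its value at `phi = 0`.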

Theorem 1 For every single qubit quantum operation $\mathcal{E}_A(\rho) = A\rho A^\dagger$, there exists a quantum
operation $\mathcal{E}_D(\rho) = D\rho D^\dagger$ with $[D, \rho] = 0$ and $D$ positive, of the same output entropy and
smaller or equal distortion.

Proof We work in the basis $\{|0\rangle, |1\rangle\}$ in which $\rho$ is diagonal, so $\rho = p_0|0\rangle\langle 0| + p_1|1\rangle\langle 1|$ with
$p_0 + p_1 = 1$ and $p_0 \ge p_1$. It is easy to see that any complex matrix $A$ can be expressed as a
product $A = UD\rho^{1/2}V\rho^{-1/2}$, where $U$ and $V$ are unitary and $D$ is diagonal positive and hence
commutes with $\rho$. This follows from applying the polar decomposition of any complex matrix
$B$, namely $B = U\Delta V$. Here $U$ and $V$ are unitary, $\Delta$ is diagonal positive with non-ascending
elements, and we choose $B = A\rho^{1/2}$ and $D = \Delta\rho^{-1/2}$. Such a decomposition ensures that
$A\rho A^\dagger = U(D\rho D^\dagger)U^\dagger$, so that $\mathrm{tr}(A\rho A^\dagger) = \mathrm{tr}(D\rho D^\dagger)$ and $S(\mathcal{E}_A) = S(\mathcal{E}_D)$. In addition, since
both $\Delta = D\rho^{1/2}$ and $\rho^{1/2}$ are diagonal positive with non-ascending elements, Lemma 1 asserts
that $|\mathrm{tr}(A\rho)| \le |\mathrm{tr}(D\rho)|$. Combining the above with the single qubit distortion formula

$$d_e(\rho, \mathcal{E}_A) = 1 - \frac{|\mathrm{tr}(A\rho)|^2}{\mathrm{tr}(A\rho A^\dagger)}, \tag{20}$$

we see that the operation $\mathcal{E}_D$ has the same output entropy but a distortion that is less than or
equal to that of $\mathcal{E}_A$, thus proving the statement of the Theorem. ∎
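
Fig. 1. Lower bound $S_1(d)$ on the single qubit rate-distortion function for $p_0$ = 0.5, 0.6, 0.7, 0.8 and 0.9.

Since $A$ is defined only up to a multiplicative constant, Theorem 1 implies a complete
parametrization for the unphysical $n = k = 1$ curve, which we denote here by $S_1(d)$. It is easy to see that
in the $\{|0\rangle, |1\rangle\}$ basis the matrix

$$A = \begin{pmatrix} \cos\theta & 0 \\ 0 & \sin\theta \end{pmatrix}, \quad \theta \in \left[0, \tfrac{\pi}{4}\right], \tag{21}$$

interpolates between the zero distortion limit $A \propto I$, where $S = S(\rho)$, and the zero entropy limit
$A = |0\rangle\langle 0|$, where we replace the source with the pure "best guess" state $|0\rangle\langle 0|$.

This curve, easily verified to be convex, is shown in Fig. 1 for several values of $p_0$. It is
parametrized as

$$S_1(\Delta) = h_2\!\left(\frac{p_0(1 + \cos\Delta)}{(p_0 + p_1) + (p_0 - p_1)\cos\Delta}\right), \qquad d(\Delta) = \frac{2p_0p_1(1 - \sin\Delta)}{(p_0 + p_1) + (p_0 - p_1)\cos\Delta}, \tag{22}$$

where $\Delta \in [0, \frac{\pi}{2}]$. Here $h_2(\lambda) = -\lambda\log_2\lambda - (1 - \lambda)\log_2(1 - \lambda)$ is the Shannon binary entropy
function. When $p_0 = \frac{1}{2}$ the above simplifies to $S_1(d) = h_2\!\left(\frac{1}{2} + \sqrt{d(1 - d)}\right)$. $S_1(d)$ serves as a
lower bound for $R_1(d)$ since, for any decomposition of unity $\sum_i A_i^\dagger A_i = I$ and $\lambda_i = \mathrm{tr}(\mathcal{E}_{A_i}(\rho))$,
we have

$$\sum_{i=1}^{k} \lambda_i\, S(\mathcal{E}_{A_i}(\rho)) \ge \sum_{i=1}^{k} \lambda_i\, S_1\!\left(d_e(\rho, \mathcal{E}_{A_i})\right) \ge S_1\!\left(\sum_{i=1}^{k} \lambda_i\, d_e(\rho, \mathcal{E}_{A_i})\right) \tag{23}$$

by the convexity of $S_1(d)$. In the case of $p_0 = \frac{1}{2}$, due to isotropy this lower bound is attainable
with $k = 2$,

$$A_1 = \begin{pmatrix} \cos\theta & 0 \\ 0 & \sin\theta \end{pmatrix}, \quad A_2 = \begin{pmatrix} \sin\theta & 0 \\ 0 & \cos\theta \end{pmatrix}, \quad \theta \in \left[0, \tfrac{\pi}{4}\right]. \tag{24}$$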
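
The single-element curve can be traced directly from (20) and (21). The following sketch (function names are illustrative) computes one point of the curve for a given angle and, for the isotropic case, compares it with the closed form $h_2(\frac{1}{2} + \sqrt{d(1-d)})$ quoted above.

```python
import math

def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def s1_point(theta, p0):
    """(d, S) for the single element A = diag(cos theta, sin theta) acting on diag(p0, p1)."""
    p1 = 1.0 - p0
    c, s = math.cos(theta), math.sin(theta)
    norm = p0 * c * c + p1 * s * s          # tr(A rho A^dagger)
    d = 1.0 - (p0 * c + p1 * s) ** 2 / norm  # distortion formula (20)
    entropy = h2(p0 * c * c / norm)          # entropy of the normalized output
    return d, entropy

def s1_isotropic(d):
    """Closed form of S_1(d) for p0 = 1/2; the max() guards against rounding below zero."""
    return h2(0.5 + math.sqrt(max(0.0, d * (1.0 - d))))
```

Sweeping `theta` from 0 to pi/4 moves along the curve from the zero-entropy end to the zero-distortion end.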

The case $p_0 > \frac{1}{2}$ is not as obvious. First we would like to show that $k = 2$ suffices. We fix $A_1$
and vary $A_i$, $2 \le i \le k$. We use Lagrange multipliers and seek the minimum of

$$\sum_{i=2}^{k} \mathrm{tr}(A_i\rho A_i^\dagger)\, S\!\left(\frac{A_i\rho A_i^\dagger}{\mathrm{tr}(A_i\rho A_i^\dagger)}\right) - \mu\sum_{i=2}^{k} |\mathrm{tr}(A_i\rho)|^2 - \sum_{i=2}^{k} \mathrm{tr}(\Lambda A_i^\dagger A_i). \tag{25}$$

Differentiating (25) with respect to $A_i$ and $A_i^\dagger$ and setting the result to zero, we obtain an equation
involving only $A_i$, $A_i^\dagger$, $\mu$ and $\Lambda$, so evidently a solution is obtained for $A_2 = \cdots = A_k$. This
has the same effect as retaining only $A_2$, so $k = 2$ includes natural solutions to the extremum
problem. Motivated by the $p_0 = \frac{1}{2}$ case, we conjecture that the global minimum is among them.

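Restricting attention to $k = 2$, we concentrate on the case where $A_1$ and $A_2$ are diagonal and
use the parametrization

$$A_1 = \begin{pmatrix} \cos\alpha & 0 \\ 0 & \cos(\alpha + \Delta) \end{pmatrix}, \quad A_2 = \begin{pmatrix} \sin\alpha & 0 \\ 0 & \sin(\alpha + \Delta) \end{pmatrix}, \quad \Delta \in \left[0, \tfrac{\pi}{2}\right], \tag{26}$$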

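and $d = 2p_0p_1(1 - \cos\Delta)$. Here $\alpha$ is a function of $\Delta$ such that

$$S = \sum_{i=1}^{2} \mathrm{tr}(A_i\rho A_i^\dagger)\, S\!\left(\frac{A_i\rho A_i^\dagger}{\mathrm{tr}(A_i\rho A_i^\dagger)}\right) \tag{27}$$

is minimized. Differentiating with respect to $\alpha$, we arrive at

$$
\begin{aligned}
&2p_0p_1\sin\Delta\left[\log_2\!\left(\frac{p_1\cos^2(\alpha+\Delta)}{p_0\cos^2\alpha}\right)\frac{\cos\alpha\cos(\alpha+\Delta)}{p_0\cos^2\alpha + p_1\cos^2(\alpha+\Delta)}
+ \log_2\!\left(\frac{p_1\sin^2(\alpha+\Delta)}{p_0\sin^2\alpha}\right)\frac{\sin\alpha\sin(\alpha+\Delta)}{p_0\sin^2\alpha + p_1\sin^2(\alpha+\Delta)}\right] \\
&\quad + \left(p_0\sin 2\alpha + p_1\sin 2(\alpha+\Delta)\right)\left[h_2\!\left(\frac{p_0\sin^2\alpha}{p_0\sin^2\alpha + p_1\sin^2(\alpha+\Delta)}\right) - h_2\!\left(\frac{p_0\cos^2\alpha}{p_0\cos^2\alpha + p_1\cos^2(\alpha+\Delta)}\right)\right] = 0,
\end{aligned}
$$

which we solve numerically. The function $\alpha(\Delta)$ is plotted in Fig. 2 for several values of $p_0$.
We also plot the corresponding rate-distortion curves in Fig. 3. The curves are convex and
approach $d_{\max} = 2p_0p_1$ with zero slope. Note that the $p_0 = \frac{1}{2}$ solution is precisely the one
obtained previously, namely $S_1(d)$.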
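
The numerical solution can be approximated without the stationarity condition by minimizing (27) directly over $\alpha$. The sketch below does this with a plain grid search; the search interval $[0, \frac{\pi}{2} - \Delta]$, which keeps all matrix entries of (26) non-negative, and the grid resolution are assumptions of this example rather than choices stated in the text.

```python
import math

def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def average_entropy(alpha, delta, p0):
    """S of (27) for the diagonal elements (26) acting on rho = diag(p0, p1)."""
    p1 = 1.0 - p0
    total = 0.0
    for a0, a1 in ((math.cos(alpha), math.cos(alpha + delta)),
                   (math.sin(alpha), math.sin(alpha + delta))):
        w0, w1 = p0 * a0 * a0, p1 * a1 * a1  # diagonal of A_i rho A_i^dagger
        lam = w0 + w1                        # tr(A_i rho A_i^dagger)
        if lam > 0.0:
            total += lam * h2(w0 / lam)
    return total

def best_alpha(delta, p0, steps=2000):
    """Grid-search minimizer of average_entropy over alpha in [0, pi/2 - delta]."""
    hi = math.pi / 2.0 - delta
    candidates = [hi * k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda a: average_entropy(a, delta, p0))
```

For $p_0 = \frac{1}{2}$ the minimizer found this way sits at $\alpha = \frac{\pi}{4} - \frac{\Delta}{2}$, which reproduces the symmetric pair (24) and the curve $S_1(d)$.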

Fig. 2. The function $\alpha(\Delta)$ that solves the stationarity condition following (27), plotted for $p_0$ = 0.5, 0.6, 0.7, 0.8 and 0.9.

Now we show that this diagonal solution is optimal with respect to local perturbations of the
$\{A_i\}$. Recall that we wish to find the optimal tradeoff between $S$ and $d = 1 - \sum_i |\mathrm{tr}(A_i\rho)|^2$
under the constraint $\sum_i A_i^\dagger A_i = I$. Notice that both $S$ and the trace-preserving condition are
invariant under the transformation $A_i \to U_iA_i$, where the $U_i$ are unitary matrices. Furthermore
$|\mathrm{tr}(U_iA_i\rho)| \le |\mathrm{tr}(A_i\rho)|$ when $A_i\rho$ is positive (see Lemma 2 below), and we may always pick $U_i$
to achieve this upper bound. This can be seen from the polar decomposition $A_i\rho = V_iD_iW_i$
and choosing $U_i = (V_iW_i)^{-1}$. Therefore we restrict attention to positive $A_i\rho$ and use a new
parametrization:

$$A_1 = f\begin{pmatrix} \dfrac{\lambda\cos\theta}{p_0} & \dfrac{x\sin\theta}{p_1} \\[2mm] \dfrac{x^*\sin\theta}{p_0} & \dfrac{(1-\lambda)\cos\theta}{p_1} \end{pmatrix}, \quad A_2 = f\begin{pmatrix} \dfrac{\mu\sin\theta}{p_0} & -\dfrac{x\cos\theta}{p_1} \\[2mm] -\dfrac{x^*\cos\theta}{p_0} & \dfrac{(1-\mu)\sin\theta}{p_1} \end{pmatrix} \tag{28}$$

in terms of $\theta$ and complex $x$. Here $\lambda$ and $\mu$ are functions of $|x|$ determined by the conditions

$$\lambda^2\cos^2\theta + \mu^2\sin^2\theta = \frac{p_0^2}{f^2} - |x|^2, \tag{29}$$

$$(1 - \lambda)^2\cos^2\theta + (1 - \mu)^2\sin^2\theta = \frac{p_1^2}{f^2} - |x|^2, \tag{30}$$

and $d = 1 - f^2$. We see from the expansion about $x = 0$ that $\lambda$ and $\mu$ are both quadratic in
$|x|$. It is also easy to see that the traces and determinants of the $A_i\rho A_i^\dagger$ (and hence the
eigenvalues) also have no terms linear in $x$. Expanding to second order about the optimal diagonal
solution, we verify that $S$ is indeed at a local minimum with respect to varying $x$. We thus
conclude our argument that the $n = 1$ rate-distortion curves $R_1(d)$ are those depicted in Fig. 3.

Fig. 3. The single qubit rate-distortion function $R_1(d)$ plotted for $p_0$ = 0.5, 0.6, 0.7, 0.8 and 0.9.



4 The rate-distortion function for general n 

Now we move to the general $n$ case and argue that we cannot do any better than $R_1(d)$. We
have $n$ qubits with joint density operator $\rho^{\otimes n}$, and we consider appropriate combinations of
quantum operations $\mathcal{E}_A(\rho^{\otimes n}) = A(\rho^{\otimes n})A^\dagger$. We work in the basis $B^n = \{|0\rangle, |1\rangle\}^n$ with $|0\rangle$ and
$|1\rangle$ defined as before. In this basis the system operator $A$ is given by

$$A = \begin{pmatrix} B & K \\ L & C \end{pmatrix}, \tag{31}$$

where $B$, $K$, $L$ and $C$ are $2^{n-1} \times 2^{n-1}$ matrices acting on the last $n - 1$ qubits. It
is easy to verify that the restriction $\mathcal{E}^{>}$ of $\mathcal{E}_A$ to the last $n - 1$ qubits is given by the set
$\{\sqrt{p_0}\,B, \sqrt{p_1}\,K, \sqrt{p_0}\,L, \sqrt{p_1}\,C\}$ of operation elements.

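We first restrict attention to processes with $A$ diagonal in the $B^n$ basis.

Theorem 2 General $n$-qubit trace-preserving processes with operation elements $\{A_i\}$ diagonal
in the $B^n$ basis cannot perform below the single qubit rate-distortion curve $R_1(d)$.

Proof We prove the theorem using induction on $n$. It is true for $n = 1$ by the results of the
previous section. Let us now assume it holds for $n$, and then show its validity for $n + 1$. We
work in the $B^{n+1}$ basis where $A_i$ is represented by a $2^{n+1} \times 2^{n+1}$ dimensional matrix

$$A_i = \begin{pmatrix} \frac{1}{\sqrt{p_0}}B_i & 0 \\ 0 & \frac{1}{\sqrt{p_1}}C_i \end{pmatrix} \tag{32}$$

with $B_i$ and $C_i$ both diagonal $2^n \times 2^n$ matrices acting on the last $n$ qubits. Then the projection
of $\mathcal{E}_{A_i}$ onto the last $n$ qubits is $\mathcal{E}^{>}_{A_i}(\rho^{\otimes n}) = B_i\rho^{\otimes n}B_i^\dagger + C_i\rho^{\otimes n}C_i^\dagger$. We also have from (32) that

$$\mathcal{E}_{A_i}(\rho^{\otimes(n+1)}) = \begin{pmatrix} B_i\rho^{\otimes n}B_i^\dagger & 0 \\ 0 & C_i\rho^{\otimes n}C_i^\dagger \end{pmatrix}. \tag{33}$$

Then the normalized projection of $\mathcal{E}_{A_i}$ onto the first qubit is

$$\frac{\mathcal{E}^{1}_{A_i}(\rho)}{\mathrm{tr}(\mathcal{E}^{1}_{A_i}(\rho))} = \begin{pmatrix} \lambda_i & 0 \\ 0 & 1 - \lambda_i \end{pmatrix}, \tag{34}$$

where $\lambda_i = \mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))/\mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes(n+1)}))$.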

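The average distortion associated with the coding procedure defined by the $\{A_i\}$ is

$$d = \frac{n}{n+1}\,d^{>} + \frac{1}{n+1}\,d^{1}, \tag{35}$$

where

$$d^{>} = \sum_i \left[\mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{B_i}) + \mathrm{tr}(\mathcal{E}_{C_i}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{C_i})\right] \tag{36}$$

and

$$d^{1} = \sum_i d_e(\rho, \mathcal{E}^{1}_{A_i}). \tag{37}$$

Using the simple identity

$$S(\lambda\rho_1 \oplus (1 - \lambda)\rho_2) = \lambda S(\rho_1) + (1 - \lambda)S(\rho_2) + h_2(\lambda) \tag{38}$$

we find that

$$S(\mathcal{E}_{A_i}(\rho^{\otimes(n+1)})) = \lambda_i S(\mathcal{E}_{B_i}(\rho^{\otimes n})) + (1 - \lambda_i)S(\mathcal{E}_{C_i}(\rho^{\otimes n})) + h_2(\lambda_i). \tag{39}$$

Hence:

$$
\begin{aligned}
\frac{1}{n+1}\sum_i \mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes(n+1)}))\,S(\mathcal{E}_{A_i}(\rho^{\otimes(n+1)}))
&= \frac{n}{n+1}\cdot\frac{1}{n}\sum_i \left[\mathrm{tr}(\mathcal{E}_{B_i}(\rho^{\otimes n}))\,S(\mathcal{E}_{B_i}(\rho^{\otimes n})) + \mathrm{tr}(\mathcal{E}_{C_i}(\rho^{\otimes n}))\,S(\mathcal{E}_{C_i}(\rho^{\otimes n}))\right] \\
&\quad + \frac{1}{n+1}\sum_i \mathrm{tr}(\mathcal{E}^{1}_{A_i}(\rho))\,S(\mathcal{E}^{1}_{A_i}(\rho)) \\
&\ge \frac{n}{n+1}R_1(d^{>}) + \frac{1}{n+1}R_1(d^{1}) \ge R_1(d).
\end{aligned}
\tag{40}
$$

The equality comes from (34), (39) and the fact that $\mathrm{tr}(\mathcal{E}_{A_i}(\rho^{\otimes(n+1)})) = \mathrm{tr}(\mathcal{E}^{1}_{A_i}(\rho))$, the first
inequality comes from the inductive hypothesis, and the second inequality is a consequence of
the convexity of $R_1(d)$ and (35). Hence, the rate for $\{A_i\}$ is greater than or equal to $R_1(d)$ at the
same distortion, as claimed. ∎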
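
Identity (38), on which the induction step rests, is easy to confirm numerically for commuting states. The sketch below assumes both states are diagonal, so that they are represented by their eigenvalue lists and the direct sum is simply the concatenation of the weighted spectra; the function names are illustrative.

```python
import math

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)

def direct_sum_entropy(lam, spec1, spec2):
    """Entropy of lam * rho1 (+) (1 - lam) * rho2, with rho1, rho2 given by their spectra."""
    return shannon([lam * q for q in spec1] + [(1.0 - lam) * q for q in spec2])

def rhs_of_identity(lam, spec1, spec2):
    """Right-hand side of (38): lam S(rho1) + (1 - lam) S(rho2) + h2(lam)."""
    return (lam * shannon(spec1) + (1.0 - lam) * shannon(spec2)
            + shannon([lam, 1.0 - lam]))
```

The two functions agree for any weight and any pair of spectra, which is the statement that the block index contributes exactly $h_2(\lambda)$ of additional entropy.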

Finally, it remains to show that for general $n$ diagonal processes are optimal. This may be
shown exactly in the case $p_0 = \frac{1}{2}$ due to its many simplifying features. We begin with two
lemmas.

Lemma 2 Given matrices $\{Y_i\}$ with $\sum_i Y_i^\dagger Y_i = I$ and positive $D$, the inequality $\sum_i |\mathrm{tr}(Y_iD)|^2 \le
|\mathrm{tr}(D)|^2$ holds.

Proof We use the fact that $D = \sqrt{D}\sqrt{D}^\dagger$ for $D$ positive and employ the Cauchy-Schwarz
inequality (17) to write

$$\sum_i |\mathrm{tr}(Y_iD)|^2 = \sum_i |\mathrm{tr}((Y_i\sqrt{D})\sqrt{D}^\dagger)|^2 \le \sum_i \mathrm{tr}(Y_iDY_i^\dagger)\,\mathrm{tr}(D) = |\mathrm{tr}(D)|^2. \tag{41}$$

The last equality comes from the cyclicity and linearity of the trace. ∎

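Lemma 3 Given operators $\{Y_i\}$ acting on $n$ qubits with $\sum_i Y_i^\dagger Y_i = I$ and positive $D$, diagonal
in the $B^n$ basis, we have the inequality

$$\sum_i \mathrm{tr}(\mathcal{E}_{Y_iD}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{Y_iD}) \ge \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_D).$$

Proof We again use induction. The $n = 1$ case follows from Lemma 2. Assuming the Lemma
holds for $n$ we prove it for $n + 1$. Consider $2^{n+1} \times 2^{n+1}$ dimensional matrices $\{Y_i\}$, and let

$$Y_i = \begin{pmatrix} E_i & F_i \\ G_i & H_i \end{pmatrix}, \quad D = \begin{pmatrix} D_0 & 0 \\ 0 & D_1 \end{pmatrix}, \tag{42}$$

with $E_i$ etc. of dimension $2^n \times 2^n$. $\sum_i Y_i^\dagger Y_i = I$ implies that

$$\sum_i \left(E_i^\dagger E_i + G_i^\dagger G_i\right) = I \tag{43}$$

and similarly for $F_i$ and $H_i$. The restriction $\mathcal{E}^{>}_{Y_iD}$ of $\mathcal{E}_{Y_iD}$ onto the last $n$ qubits is described by
the set $\{E_iD_0, F_iD_1, G_iD_0, H_iD_1\}$. Then

$$
\begin{aligned}
\sum_i \mathrm{tr}(\mathcal{E}^{>}_{Y_iD}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}^{>}_{Y_iD})
&= \sum_i \Big[\mathrm{tr}(\mathcal{E}_{E_iD_0}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{E_iD_0}) + \mathrm{tr}(\mathcal{E}_{F_iD_1}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{F_iD_1}) \\
&\qquad + \mathrm{tr}(\mathcal{E}_{G_iD_0}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{G_iD_0}) + \mathrm{tr}(\mathcal{E}_{H_iD_1}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{H_iD_1})\Big] \\
&\ge \mathrm{tr}(\mathcal{E}_{D_0}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{D_0}) + \mathrm{tr}(\mathcal{E}_{D_1}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}_{D_1}) \\
&= \mathrm{tr}(\mathcal{E}^{>}_{D}(\rho^{\otimes n}))\,d_e(\rho^{\otimes n}, \mathcal{E}^{>}_{D}).
\end{aligned}
\tag{44}
$$

The inequality comes from the inductive hypothesis and (43). Finally, this result is invariant
under permutations of the qubits; averaging over all permutations yields

$$\sum_i \mathrm{tr}(\mathcal{E}_{Y_iD}(\rho^{\otimes(n+1)}))\,d_e(\rho^{\otimes(n+1)}, \mathcal{E}_{Y_iD}) \ge \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes(n+1)}))\,d_e(\rho^{\otimes(n+1)}, \mathcal{E}_D). \tag{45}$$

This proves the Lemma. ∎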

Theorem 3 General $n$-qubit processes cannot perform below the single qubit entropy-distortion
curve $S_1(d)$ in the case of isotropic sources ($p_0 = \frac{1}{2}$).

Proof This is an immediate consequence of Lemma 3. We ignore the trace-preserving condition
for the time being and consider $\mathcal{E}_A(\rho^{\otimes n}) = A(\rho^{\otimes n})A^\dagger$. Then we use the polar decomposition $A =
UDV$ with $U$ and $V$ unitary and $D$ diagonal positive. Using the fact that $\rho = \frac{1}{2}I$, it is easy to see
that $\mathrm{tr}(\mathcal{E}_A(\rho^{\otimes n})) = \mathrm{tr}(\mathcal{E}_D(\rho^{\otimes n}))$, $S(\mathcal{E}_A(\rho^{\otimes n})) = S(\mathcal{E}_D(\rho^{\otimes n}))$ and $d_e(\rho^{\otimes n}, \mathcal{E}_A) = d_e(\rho^{\otimes n}, \mathcal{E}_{VUD})$.
Then from Lemma 3 with a single element ($m = 1$) and $Y_1 = VU$, we get $d_e(\rho^{\otimes n}, \mathcal{E}_A) \ge d_e(\rho^{\otimes n}, \mathcal{E}_D)$. Therefore,
there is a diagonal map that does at least as well as $\mathcal{E}_A$. From a trivial variation on Theorem 2
(note that the trace-preserving condition plays no role in the proof), this diagonal map cannot
do better than the $n = k = 1$ curve $S_1(d)$, which is attainable for $p_0 = \frac{1}{2}$. Having established
that the optimal $\mathcal{E}_A$ yields the convex $S_1(d)$, using the same argument as in (23) we see that
reintroducing the trace-preserving condition does not affect our result. Hence the Theorem is
proved. ∎

We conjecture that the theorem also holds for the case $p_0 > \frac{1}{2}$, and we now present some evidence
to support this conjecture. It again suffices to show that diagonal processes are optimal for
general $n$.

• Consider perturbing a process defined by $2^n \times 2^n$ dimensional diagonal matrices $\{A_i\}$ with
$\sum_i A_i^\dagger A_i = I$ by general matrices $\{Q_i\}$ with diagonal elements all equal to zero. It is easy to
see that to linear order the trace-preserving condition still holds, and both average entropy and
distortion remain unchanged. Hence, all diagonal processes are local extrema with respect to
off-diagonal perturbations.

• In Theorem 2 we never used the fact that $B_i$ and $C_i$ were diagonal, so a more general class
of operators given by (32), in $B^n$ or any other basis obtained by permutations of the qubits, lies
above the $R_1(d)$ curve.

• A straightforward modification of Theorem 3 shows that diagonal processes $D_i$ do better
than $U_iD_i$, where $U_i$ is any unitary operator (note that the trace-preserving condition still holds).

• By iterating the argument preceding Theorem 2, the restriction of a general $n$-qubit
operation onto a single qubit involves $2^{n-1}$ operation elements, which greatly increases the entropy
exchange with the environment of that qubit. Essentially, individual qubits act as the
environment for each other, and entangling them creates noise. On the other hand, as in classical
information theory, the benefit of entangling (correlating) the qubits is a reduction in entropy,
since $S(\mathcal{E}(\rho^{\otimes n})) \le \sum_\alpha S(\mathcal{E}_\alpha(\rho))$, where $\mathcal{E}_\alpha$ is the restriction of $\mathcal{E}$ to the $\alpha$-th qubit. There is a
competition between these two effects, and the former wins, as we have proven rigorously for
$p_0 = \frac{1}{2}$. In this sense, however, there is nothing special about $p_0 = \frac{1}{2}$. If anything, we would
expect the entropy to be even harder to reduce via quantum operations for $p_0 > \frac{1}{2}$ than for
$p_0 = \frac{1}{2}$ because it is lower to start with.

5 Physical realization of the R(d) curve



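We now elaborate on how our coding procedure may be realized physically. For the lossy part
of the coding we need to provide an ancilla qubit in a definite state. We first apply a unitary
transformation entangling the ancilla with the source qubit, and then measure the ancilla. In the
basis $\{|0\rangle_A|0\rangle_Q, |0\rangle_A|1\rangle_Q, |1\rangle_A|0\rangle_Q, |1\rangle_A|1\rangle_Q\}$, the unitary transformation is given by the matrix

$$U = \begin{pmatrix} \cos\alpha & 0 & -\sin\alpha & 0 \\ 0 & \cos(\alpha+\Delta) & 0 & -\sin(\alpha+\Delta) \\ \sin\alpha & 0 & \cos\alpha & 0 \\ 0 & \sin(\alpha+\Delta) & 0 & \cos(\alpha+\Delta) \end{pmatrix} \tag{46}$$

with $\Delta \in [0, \frac{\pi}{2}]$ and $\alpha = \alpha(\Delta)$ as defined before. The ancilla is prepared in the $|0\rangle_A$ state so
that the initial density operator for the ancilla-source system is

$$\rho^{AQ} = \begin{pmatrix} \rho & 0 \\ 0 & 0 \end{pmatrix}. \tag{47}$$

Then

$$U\rho^{AQ}U^\dagger = \begin{pmatrix} A_1\rho A_1^\dagger & A_1\rho A_2^\dagger \\ A_2\rho A_1^\dagger & A_2\rho A_2^\dagger \end{pmatrix}, \tag{48}$$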

where $A_1$ and $A_2$ are as defined in (26). We then measure the ancilla qubit. If the outcome is
$|0\rangle_A$, we know the map $\rho \to \mathcal{E}_{A_1}(\rho)$ has been performed and we label the qubit as belonging to
type 1. Similarly, if the outcome is $|1\rangle_A$, we know the map $\rho \to \mathcal{E}_{A_2}(\rho)$ has transpired and label
the qubit to be of type 2. In the end we perform two Schumacher encodings, one on all the
qubits of the first type and a separate one on all the qubits of the second type. When decoding, we
need information about the sequence of operations performed. The rate of classical information
required for this is $r = h_2(\mathrm{tr}(A_1\rho A_1^\dagger))$. These classical rates are plotted for several values of $p_0$
in Fig. 4.
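
The measurement step can be simulated classically for a diagonal source. The sketch below, whose function names and example parameters are assumptions made for illustration, builds the matrix (46), applies it to the ancilla-source basis states, and returns the probabilities of the two ancilla outcomes together with the classical rate $r$.

```python
import math

def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def ancilla_unitary(alpha, delta):
    """The 4x4 matrix (46); basis index = 2 * ancilla + source."""
    c0, s0 = math.cos(alpha), math.sin(alpha)
    c1, s1 = math.cos(alpha + delta), math.sin(alpha + delta)
    return [[c0, 0.0, -s0, 0.0],
            [0.0, c1, 0.0, -s1],
            [s0, 0.0, c0, 0.0],
            [0.0, s1, 0.0, c1]]

def outcome_probabilities(alpha, delta, p0):
    """Probabilities of ancilla outcomes 0 and 1 for rho = diag(p0, 1 - p0)."""
    u = ancilla_unitary(alpha, delta)
    p = (p0, 1.0 - p0)
    prob = [0.0, 0.0]
    for q in range(2):          # source basis state |q>, ancilla in |0>: column q of U
        for row in range(4):
            prob[row // 2] += p[q] * u[row][q] ** 2
    return prob

def classical_rate(alpha, delta, p0):
    """r = h2(tr(A_1 rho A_1^dagger)), the entropy of the measurement outcome."""
    return h2(outcome_probabilities(alpha, delta, p0)[0])
```

The probability of outcome 0 returned here equals $p_0\cos^2\alpha + p_1\cos^2(\alpha+\Delta)$, that is, $\mathrm{tr}(A_1\rho A_1^\dagger)$ for the elements (26).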



6 Discussion 

We have shown that for the distortion measure in question and when allowed an unrestricted 
classical side channel, optimum quantum rate-distortion codes are separable into a lossy part 
involving single qubit operations followed by standard Schumacher lossless coding of large blocks 
of qubits. 

Our result has the following interpretation: the rate-distortion curve is achieved by quantum 
operations that produce no entropy exchange with the environment of any individual qubit. 
We do not expect zero entropy exchange to be optimal for more general distortion measures. 
Since our distortion measure, which is based on the concept of entanglement fidelity, emphasizes 
preserving the state of RQ, it forbids any increase of the entropy of RQ which means it forbids 
any entropy exchange. We also do not believe n = 1 to be optimal when restrictions on r are 
imposed since, as remarked in Section 2, the entropy exchange is positive as long as there is 
uncertainty in the value of the index random variable I.

Fig. 4. The classical information rate needed to decode vs. $d$ for $p_0$ = 0.5, 0.6, 0.7, 0.8 and 0.9.

Let us examine the action of our quantum map on normalized pure states. If we picture
$|0\rangle$ and $|1\rangle$ as orthogonal vectors then, depending on which of the two operations has been
performed, the map rotates our pure state vector toward $|0\rangle$ or toward $|1\rangle$. The source is biased
toward $|0\rangle$, which it produces with a higher probability than $|1\rangle$. The first type of operation
produces qubits biased even more toward $|0\rangle$, hence causing a decrease in entropy. The second
type does the opposite and perhaps even increases the entropy for $p_0 > \frac{1}{2}$; however, it has to
occur a certain fraction of the time in order to obey the trace-preserving condition, which says
that the total probability of performing some operation must be equal to 1 regardless of the
input state. On average, the entropy does decrease, while the discrepancy between the initial
and final state increases. The $R(d)$ curve is thus swept out.

Notice that our quantum $R(d)$ curve first falls to $R = 0$ at $d_{\max} = 2p_0p_1$, as opposed to the
classical value $d_{\max} = p_1$ associated with reconstructing the source bit with the best guess at
its value. This, too, is due to our choice of fidelity measure: replacing the original qubit with a
fresh one prepared in the state $|0\rangle$ destroys the entanglement with the original reference system.
The best we can do is project onto $|0\rangle$ with probability $p_0$ and otherwise project onto $|1\rangle$.

We do not expect a general expression resembling the classical prescription (1) for the
rate-distortion function that is valid for all distortion measures to exist for quantum rate-distortion.
Our reason for this lies in the richness of distortion measures, which vary in their degree of
"quantumness". The one we have used, based on entanglement fidelity, evidently has a highly
quantum nature. On the other hand, we could view $\rho$ as being realized by a specific ensemble
like $\mathcal{Q} = \{(|0\rangle, p_0), (|1\rangle, p_1)\}$, and use as our distortion measure the corresponding average pure
state distortion measure $d(\mathcal{Q}^n, \mathcal{D}^{(n)} \circ \mathcal{C}^{(n)})$ based on the average pure state fidelity $F(\mathcal{Q}, \mathcal{E})$,
namely

$$F(\mathcal{Q}, \mathcal{E}) = p_0\langle 0|\mathcal{E}(|0\rangle\langle 0|)|0\rangle + p_1\langle 1|\mathcal{E}(|1\rangle\langle 1|)|1\rangle. \tag{49}$$

Here we are able to attain zero distortion merely by sending classical information: the
measurement results in the $\{|0\rangle, |1\rangle\}$ basis. If we do not allow storing classical information, then the
appropriate cross section of the rate-distortion manifold becomes the classical rate-distortion
function for the Hamming measure, namely $R(d, 0) = S(\rho) - h_2(d)$.
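
For reference, the classical Hamming curve quoted above is simple to evaluate. The sketch below uses the standard fact that for a binary source the formula $h_2(p_0) - h_2(d)$ applies for $0 \le d \le \min(p_0, p_1)$ and that the rate is zero beyond that point; the function name is an illustrative choice.

```python
import math

def h2(x):
    """Binary Shannon entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def hamming_rate(d, p0):
    """Classical rate-distortion function of a binary source under Hamming distortion."""
    p_min = min(p0, 1.0 - p0)
    if d >= p_min:
        return 0.0  # guessing the more likely symbol already achieves distortion p_min
    return h2(p0) - h2(d)
```

For a source with $p_0 = 0.8$ the curve starts at $h_2(0.8)$ for $d = 0$ and reaches zero at $d = 0.2$, to be compared with $d_{\max} = 2p_0p_1 = 0.32$ for the entanglement-fidelity measure.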

One could also investigate more general ensembles, as well as distortion measures tied to 
specific quantum cryptography protocols. Finally, the work presented here naturally generalizes 
to systems with more than two degrees of freedom. 



Acknowledgement We thank Konrad Banaszek, Howard Barnum, David Mermin, Ian Walmsley
and anonymous referees for valuable comments and particularly for pointing out problem
formulation inadequacies in earlier versions of the manuscript. This research was supported in 
part by the DoD Multidisciplinary University Research Initiative (MURI) program administered 
by the Army Research Office under Grant DAAD19-99-1-0215 and NSF Grant CCR-9980616. 



References 

[1] T. Berger, Rate Distortion Theory, Prentice Hall (1971)

[2] B. Schumacher, "Quantum coding", Phys. Rev. A 51, 2738 (1995); R. Jozsa and B. Schumacher,
"A new proof of the quantum noiseless coding theorem", J. Mod. Optics 41, 2343 (1994)

[3] S. Lloyd, "Capacity of the noisy quantum channel", Phys. Rev. A 55, 1613 (1996)

[4] H. Barnum, P. Hayden, R. Jozsa and A. Winter, "On the reversible extraction of classical
information from a quantum source", LANL preprint quant-ph/0011072

[5] B. Schumacher and M. A. Nielsen, "Quantum data processing and error correction", Phys.
Rev. A 54, 2629 (1996)

[6] H. Barnum, "Quantum rate-distortion coding", Phys. Rev. A 62, 42309 (2000)

[7] B. Schumacher, "Sending entanglement through noisy quantum channels", Phys. Rev. A
54, 2614 (1995)

[8] H. Barnum, M. A. Nielsen and B. Schumacher, "Information transmission through a noisy
quantum channel", Phys. Rev. A 57, 4153 (1998)

[9] I. L. Chuang and D. S. Modha, "Reversible arithmetic coding for quantum data compression",
IEEE Trans. IT 46, 1104 (2000)

[10] A. Wehrl, "General properties of entropy", Rev. Mod. Phys. 50, 221 (1978)






